-
Notifications
You must be signed in to change notification settings - Fork 13.9k
Add a fast path for lowering trivial consts #148040
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
Add a fast path for lowering trivial consts
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
|
Finished benchmarking commit (4c15d20): comparison URL. Overall result: ❌✅ regressions and improvements - BENCHMARK(S) FAILEDBenchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @bors rollup=never ❗ ❗ ❗ ❗ ❗
❗ ❗ ❗ ❗ ❗ Instruction countOur most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage)Results (primary -0.2%, secondary 2.5%)A less reliable metric. May be of interest, but not used to determine the overall result above.
CyclesResults (primary -1.5%, secondary -0.4%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary sizeResults (primary -0.6%, secondary 0.0%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Bootstrap: 476.496s -> 475.064s (-0.30%) |
This comment has been minimized.
This comment has been minimized.
|
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
Add a fast path for lowering trivial consts
This comment has been minimized.
This comment has been minimized.
|
Finished benchmarking commit (4931b5e): comparison URL. Overall result: ❌✅ regressions and improvements - please read the text belowBenchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @bors rollup=never Instruction countOur most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage)Results (primary -0.8%, secondary -1.1%)A less reliable metric. May be of interest, but not used to determine the overall result above.
CyclesResults (primary -9.2%, secondary -1.6%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary sizeResults (primary -0.7%, secondary 0.0%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Bootstrap: 474.337s -> 474.199s (-0.03%) |
|
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
Add a fast path for lowering trivial consts
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
|
Finished benchmarking commit (8907237): comparison URL. Overall result: ❌✅ regressions and improvements - please read the text belowBenchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @bors rollup=never Instruction countOur most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage)Results (primary -3.7%, secondary 2.0%)A less reliable metric. May be of interest, but not used to determine the overall result above.
CyclesResults (primary -7.5%, secondary -3.4%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary sizeResults (primary -1.7%, secondary -3.5%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Bootstrap: 474.979s -> 473.593s (-0.29%) |
| | DefKind::AnonConst | ||
| ) && trivial_const(&body).is_some() | ||
| { | ||
| return tcx.alloc_steal_mir(body); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we building the Mir for trivial consts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My initial assessment indicated that most of the savings were from what we do after building MIR, and detecting trivial consts on THIR looked hard.
I think that pushing this detection earlier in compilation would make sense as a later extension.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Absolutely. Pls drop that as a comment right here
|
Even the instruction count regressions (non-incremental) seem to be an improvement in time measured. I don't think this PR needs more performance tuning, as it's somewhat expected that incremental may do more work now due to the extra query, but often still is an improvement |
|
I agree that the performance is pretty good. Also, all of my attempts are reducing the perf overhead have been completely ineffectual. The latest report looks better than the one before it because I let the optimization apply to more DefKinds. I'm cleaning up the code so that it integrates a bit better in ctfe, which will probably make Ralf happier and also make this easier to extend. |
13c516e to
775da71
Compare
|
r? oli-obk |
|
|
| matches!(tcx.def_kind(def), DefKind::AssocConst | DefKind::Const | DefKind::AnonConst) | ||
| } | ||
|
|
||
| fn trivial_const_provider<'tcx>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems worth moving this query impl into a separate file, to avoid growing this already-big file even bigger.
Also please add more comments, in particular explaining the overall contract this query must abide by.
| tcx.ensure_done().coroutine_by_move_body_def_id(def); | ||
| } | ||
|
|
||
| // the `trivial_const` query uses mir_built, so make sure it is run. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mir_built also uses trivial_const, so I am confused... this sounds cyclic?
| // Trying to push this logic earlier in the compiler and never even produce the Body would | ||
| // probably improve compile time. | ||
| if def_kind_compatible_with_trivial_mir(tcx, def) && trivial_const(&body).is_some() { | ||
| let body = tcx.alloc_steal_mir(body); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| let body = tcx.alloc_steal_mir(body); | |
| // Skip all the passes below for trivial consts. | |
| let body = tcx.alloc_steal_mir(body); |
The objective of this PR is to improve compilation performance for crates that define a lot of trivial consts. This is a flamegraph of a build of a library crate that is just 100,000 trivial consts, taken from a nightly compiler:

My objective is to target all of the cycles in
eval_to_const_value_rawthat are not part ofmir_built, because if you look at themir_builtfor a trivial const, we already have the value available.In this PR, the definition of a trivial const is this:
Specifically, we look for if the
mir_builtbody is a single basic block containing one assign statement and a return terminator, where the assign statement assigns anOperand::Constant(Const::Val). The MIR dumps for these look like:The implementation is built around a new query,
trivial_const(LocalDefId) -> Option<(ConstValue, Ty)>which returns the contents of theConst::Valin themir_builtif theLocalDefIdis a trivial const.Then I added debug assertions to the beginning of
mir_for_ctfeandmir_promotedto prevent trying to get the body of a trivial const, because that would defeat the optimization here. But these are deliberately debug assertions because the consequence of failing the assertion is that compilation is slow, not corrupt. If we made these hard assertions, I'm sure there are obscure scenarios people will run into where the compiler would ICE instead of continuing on compilation, just a bit slower. I'd like to know about those, but I do not think serving up an ICE is worth it.With the assertions in place, I just added logic around all the places they were hit, to skip over trying to analyze the bodies of trivial consts.
In the future, I'd like to see this work extended by:
_1 = const 0_usize; _0 = &_1, which would make a lot of promoteds into trivial constsconst A: usize = B, which haveOperand::Constant(Const::Unevaluated)